Overview

Dataset statistics

Number of variables: 8
Number of observations: 284
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 10
Duplicate rows (%): 3.5%
Total size in memory: 17.9 KiB
Average record size in memory: 64.5 B
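
The memory and duplicate figures above can be reproduced with simple arithmetic. The sketch below assumes that every column stores 8-byte values and that the index adds 128 bytes. Neither assumption is stated in the report, but together they match the reported totals.

```python
# Assumptions (not stated in the report): 8 bytes per value, 128-byte index.
N_ROWS = 284
N_COLS = 8
BYTES_PER_VALUE = 8   # assumed, e.g. float64 or int64 columns
INDEX_BYTES = 128     # assumed index overhead


def memory_summary(n_rows, n_cols, bytes_per_value, index_bytes):
    """Return (total size in KiB, average bytes per record)."""
    total_bytes = n_rows * n_cols * bytes_per_value + index_bytes
    return total_bytes / 1024, total_bytes / n_rows


def percent(part, whole):
    """Share of `part` in `whole`, expressed as a percentage."""
    return 100 * part / whole


total_kib, avg_record = memory_summary(N_ROWS, N_COLS, BYTES_PER_VALUE, INDEX_BYTES)
print(f"Total size in memory: {total_kib:.1f} KiB")
print(f"Average record size in memory: {avg_record:.1f} B")
print(f"Duplicate rows: {percent(10, N_ROWS):.1f}%")
```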

Variable types

Numeric: 8

Alerts

Dataset has 10 (3.5%) duplicate rows [Duplicates]
Cement is highly overall correlated with Blast Furnace Slag _component_2 and 5 other fields [High correlation]
Blast Furnace Slag _component_2 is highly overall correlated with Cement and 5 other fields [High correlation]
Fly Ash _component_3 is highly overall correlated with Cement and 5 other fields [High correlation]
Water_component_4 is highly overall correlated with Cement and 5 other fields [High correlation]
Superplasticizer_component_5 is highly overall correlated with Cement and 5 other fields [High correlation]
Coarse Aggregate_component_6 is highly overall correlated with Cement and 5 other fields [High correlation]
Fine Aggregate_component_7 is highly overall correlated with Cement and 5 other fields [High correlation]
Age_day is highly overall correlated with Water_component_4 [High correlation]
Blast Furnace Slag _component_2 has 105 (37.0%) zeros [Zeros]
Fly Ash _component_3 has 184 (64.8%) zeros [Zeros]
Superplasticizer_component_5 has 68 (23.9%) zeros [Zeros]
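
The zero percentages in the alerts are the zero counts divided by the 284 observations. A minimal sketch of that check follows. The helper name `zero_share` and the short example column are illustrative and are not part of the report.

```python
N_OBSERVATIONS = 284


def zero_share(values):
    """Return (number of zeros, percentage of zeros) for one column."""
    zeros = sum(1 for v in values if v == 0)
    return zeros, 100 * zeros / len(values)


# Hypothetical five-value column, used only to show the helper.
print(zero_share([0.0, 0.0, 24.5, 0.0, 118.3]))

# Zero counts as reported in the alerts.
reported_zeros = {
    "Blast Furnace Slag _component_2": 105,
    "Fly Ash _component_3": 184,
    "Superplasticizer_component_5": 68,
}
for name, zeros in reported_zeros.items():
    print(f"{name}: {100 * zeros / N_OBSERVATIONS:.1f}% zeros")
```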

Reproduction

Analysis started: 2023-03-02 18:01:50.477320
Analysis finished: 2023-03-02 18:02:00.846137
Duration: 10.37 seconds
Software version: pandas-profiling v3.5.0
Download configuration: config.json
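
The reported duration is the difference between the two timestamps above. A small sketch using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2023-03-02 18:01:50.477320")
finished = datetime.fromisoformat("2023-03-02 18:02:00.846137")

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```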

Variables

Cement
Real number (ℝ)

Distinct: 50
Distinct (%): 17.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 310.30106
Minimum: 139.6
Maximum: 540
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.3 KiB

Quantile statistics

Minimum: 139.6
5-th percentile: 168
Q1: 213.8
Median: 313.3
Q3: 380
95-th percentile: 475
Maximum: 540
Range: 400.4
Interquartile range (IQR): 166.2
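
The quantile statistics can be computed from the raw column as sketched below. This assumes linear interpolation between order statistics, which is the default of pandas `Series.quantile`. The report does not state which interpolation it uses, and the helper names are illustrative.

```python
def quantile(values, q):
    """Quantile q in [0, 1], linearly interpolated between order statistics."""
    data = sorted(values)
    position = q * (len(data) - 1)
    lower = int(position)
    upper = min(lower + 1, len(data) - 1)
    fraction = position - lower
    return data[lower] + fraction * (data[upper] - data[lower])


def quantile_summary(values):
    """Collect the quantities listed under 'Quantile statistics'."""
    q1, q3 = quantile(values, 0.25), quantile(values, 0.75)
    return {
        "minimum": min(values),
        "5-th percentile": quantile(values, 0.05),
        "Q1": q1,
        "median": quantile(values, 0.5),
        "Q3": q3,
        "95-th percentile": quantile(values, 0.95),
        "maximum": max(values),
        "range": max(values) - min(values),
        "IQR": q3 - q1,
    }


# Hypothetical four-value column, used only to show the interpolation.
print(quantile_summary([1, 2, 3, 4]))
```

For Cement, the reported range and IQR follow directly from the listed values: 540 - 139.6 = 400.4 and 380 - 213.8 = 166.2.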

Descriptive statistics

Standard deviation: 99.981125
Coefficient of variation (CV): 0.32220685
Kurtosis: -1.0229998
Mean: 310.30106
Median Absolute Deviation (MAD): 83.45
Skewness: 0.25877377
Sum: 88125.5
Variance: 9996.2254
Monotonicity: Not monotonic
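
Several of the descriptive statistics are tied together by simple identities: the mean is the sum divided by the number of observations, the variance is the squared standard deviation, and the coefficient of variation is the standard deviation divided by the mean. The sketch below checks these against the Cement figures and adds an illustrative `mad` helper (median of absolute deviations from the median), which is a name chosen here rather than one taken from the report.

```python
from statistics import median

# Values copied from the report for Cement.
N_OBSERVATIONS = 284
TOTAL = 88125.5          # Sum
STD = 99.981125          # Standard deviation

mean = TOTAL / N_OBSERVATIONS
variance = STD ** 2
cv = STD / mean


def mad(values):
    """Median absolute deviation from the median."""
    center = median(values)
    return median(abs(v - center) for v in values)


print(f"Mean: {mean:.5f}")
print(f"Variance: {variance:.4f}")
print(f"Coefficient of variation: {cv:.8f}")
# Hypothetical column: a single outlier barely moves the MAD.
print(mad([1, 2, 3, 4, 100]))
```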
Histogram with fixed size bins (bins=50)
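
A fixed-size-bin histogram splits the value range into equally wide intervals and counts the observations in each. The sketch below assumes the bins span the column minimum to its maximum, which the report does not state. Under that assumption the 50 Cement bins would each be (540 - 139.6) / 50 = 8.008 units wide.

```python
def fixed_bin_histogram(values, bins):
    """Count values in `bins` equal-width intervals spanning min..max."""
    low, high = min(values), max(values)
    counts = [0] * bins
    width = (high - low) / bins
    if width == 0:
        # All values identical: put everything in the first bin.
        counts[0] = len(values)
        return counts
    for v in values:
        # The maximum falls on the right edge, so clamp it into the last bin.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts


# Hypothetical data, used only to show the binning.
print(fixed_bin_histogram([0, 1, 2, 3, 4], bins=2))
```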
Value | Count | Frequency (%)
362.6 | 20 | 7.0%
425 | 20 | 7.0%
475 | 11 | 3.9%
251.4 | 10 | 3.5%
380 | 10 | 3.5%
139.6 | 6 | 2.1%
237.5 | 6 | 2.1%
332.5 | 6 | 2.1%
198.6 | 6 | 2.1%
427.5 | 6 | 2.1%
Other values (40) | 183 | 64.4%

Value | Count | Frequency (%)
139.6 | 6 | 2.1%
166.1 | 5 | 1.8%
168 | 5 | 1.8%
190 | 5 | 1.8%
190.3 | 5 | 1.8%
190.7 | 5 | 1.8%
194.7 | 5 | 1.8%
198.6 | 6 | 2.1%
212 | 5 | 1.8%
212.1 | 5 | 1.8%

Value | Count | Frequency (%)
540 | 2 | 0.7%
531.3 | 5 | 1.8%
485 | 1 | 0.4%
475 | 11 | 3.9%
469 | 5 | 1.8%
439 | 5 | 1.8%
427.5 | 6 | 2.1%
425 | 20 | 7.0%
401.8 | 5 | 1.8%
389.9 | 5 | 1.8%

Blast Furnace Slag _component_2
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 27
Distinct (%): 9.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 91.301056
Minimum: 0
Maximum: 282.8
Zeros: 105
Zeros (%): 37.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 97.1
Q3: 180
95-th percentile: 233.75
Maximum: 282.8
Range: 282.8
Interquartile range (IQR): 180

Descriptive statistics

Standard deviation: 84.654521
Coefficient of variation (CV): 0.92720199
Kurtosis: -1.1486138
Mean: 91.301056
Median Absolute Deviation (MAD): 92.1
Skewness: 0.34472113
Sum: 25929.5
Variance: 7166.3879
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=27)

Value | Count | Frequency (%)
0 | 105 | 37.0%
189 | 30 | 10.6%
106.3 | 20 | 7.0%
98.1 | 10 | 3.5%
47.5 | 6 | 2.1%
209.4 | 6 | 2.1%
237.5 | 6 | 2.1%
132.4 | 6 | 2.1%
142.5 | 6 | 2.1%
95 | 6 | 2.1%
Other values (17) | 83 | 29.2%

Value | Count | Frequency (%)
0 | 105 | 37.0%
38 | 4 | 1.4%
42.1 | 5 | 1.8%
47.5 | 6 | 2.1%
76 | 5 | 1.8%
93.8 | 5 | 1.8%
94.7 | 5 | 1.8%
95 | 6 | 2.1%
97.1 | 5 | 1.8%
98.1 | 10 | 3.5%

Value | Count | Frequency (%)
282.8 | 4 | 1.4%
262.2 | 5 | 1.8%
237.5 | 6 | 2.1%
212.5 | 5 | 1.8%
209.4 | 6 | 2.1%
200.9 | 5 | 1.8%
190 | 5 | 1.8%
189.2 | 5 | 1.8%
189 | 30 | 10.6%
177 | 5 | 1.8%

Fly Ash _component_3
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 16
Distinct (%): 5.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.855634
Minimum: 0
Maximum: 163.8
Zeros: 184
Zeros (%): 64.8%
Negative: 0
Negative (%): 0.0%
Memory size: 2.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 96.7
95-th percentile: 125.37
Maximum: 163.8
Range: 163.8
Interquartile range (IQR): 96.7

Descriptive statistics

Standard deviation: 55.186615
Coefficient of variation (CV): 1.4578178
Kurtosis: -0.88374247
Mean: 37.855634
Median Absolute Deviation (MAD): 0
Skewness: 0.92242749
Sum: 10751
Variance: 3045.5625
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=16)

Value | Count | Frequency (%)
0 | 184 | 64.8%
118.3 | 15 | 5.3%
121.6 | 10 | 3.5%
24.5 | 10 | 3.5%
100.4 | 10 | 3.5%
96.7 | 5 | 1.8%
94.6 | 5 | 1.8%
100.5 | 5 | 1.8%
125.4 | 5 | 1.8%
125.2 | 5 | 1.8%
Other values (6) | 30 | 10.6%

Value | Count | Frequency (%)
0 | 184 | 64.8%
24.5 | 10 | 3.5%
94.1 | 5 | 1.8%
94.6 | 5 | 1.8%
95.7 | 5 | 1.8%
96.7 | 5 | 1.8%
100.4 | 10 | 3.5%
100.5 | 5 | 1.8%
118.2 | 5 | 1.8%
118.3 | 15 | 5.3%

Value | Count | Frequency (%)
163.8 | 5 | 1.8%
163.3 | 5 | 1.8%
125.4 | 5 | 1.8%
125.2 | 5 | 1.8%
124.8 | 5 | 1.8%
121.6 | 10 | 3.5%
118.3 | 15 | 5.3%
118.2 | 5 | 1.8%
100.5 | 5 | 1.8%
100.4 | 10 | 3.5%

Water_component_4
Real number (ℝ)

Distinct: 40
Distinct (%): 14.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 179.19965
Minimum: 121.8
Maximum: 228
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.3 KiB

Quantile statistics

Minimum: 121.8
5-th percentile: 138.4
Q1: 158.725
Median: 175.5
Q3: 192
95-th percentile: 228
Maximum: 228
Range: 106.2
Interquartile range (IQR): 33.275

Descriptive statistics

Standard deviation: 28.882649
Coefficient of variation (CV): 0.16117581
Kurtosis: -0.65859819
Mean: 179.19965
Median Absolute Deviation (MAD): 16.5
Skewness: 0.38313739
Sum: 50892.7
Variance: 834.20739
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=40)

Value | Count | Frequency (%)
228 | 53 | 18.7%
164.9 | 20 | 7.0%
153.5 | 15 | 5.3%
192 | 14 | 4.9%
188.5 | 10 | 3.5%
181.7 | 10 | 3.5%
159.4 | 5 | 1.8%
159.3 | 5 | 1.8%
187.4 | 5 | 1.8%
186.7 | 5 | 1.8%
Other values (30) | 142 | 50.0%

Value | Count | Frequency (%)
121.8 | 5 | 1.8%
126.6 | 5 | 1.8%
137.8 | 5 | 1.8%
141.8 | 5 | 1.8%
144.7 | 5 | 1.8%
145.9 | 5 | 1.8%
146 | 1 | 0.4%
147.4 | 5 | 1.8%
151.4 | 5 | 1.8%
153.5 | 15 | 5.3%

Value | Count | Frequency (%)
228 | 53 | 18.7%
197.9 | 5 | 1.8%
195.5 | 5 | 1.8%
195.2 | 5 | 1.8%
192 | 14 | 4.9%
189.3 | 5 | 1.8%
188.5 | 10 | 3.5%
187.4 | 5 | 1.8%
186.7 | 5 | 1.8%
186 | 5 | 1.8%

Superplasticizer_component_5
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 35
Distinct (%): 12.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.3704225
Minimum: 0
Maximum: 32.2
Zeros: 68
Zeros (%): 23.9%
Negative: 0
Negative (%): 0.0%
Memory size: 2.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 4.5
Median: 7.5
Q3: 11.6
95-th percentile: 23.19
Maximum: 32.2
Range: 32.2
Interquartile range (IQR): 7.1

Descriptive statistics

Standard deviation: 7.2024219
Coefficient of variation (CV): 0.86046098
Kurtosis: 1.3963984
Mean: 8.3704225
Median Absolute Deviation (MAD): 4.1
Skewness: 1.0696682
Sum: 2377.2
Variance: 51.874882
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=35)

Value | Count | Frequency (%)
0 | 68 | 23.9%
11.6 | 20 | 7.0%
16.5 | 15 | 5.3%
6.7 | 10 | 3.5%
5.7 | 10 | 3.5%
4.6 | 10 | 3.5%
7.8 | 10 | 3.5%
4.5 | 10 | 3.5%
6.1 | 5 | 1.8%
9.5 | 5 | 1.8%
Other values (25) | 121 | 42.6%

Value | Count | Frequency (%)
0 | 68 | 23.9%
2.5 | 2 | 0.7%
4.5 | 10 | 3.5%
4.6 | 10 | 3.5%
5.5 | 5 | 1.8%
5.7 | 10 | 3.5%
5.8 | 5 | 1.8%
6.1 | 5 | 1.8%
6.4 | 5 | 1.8%
6.7 | 10 | 3.5%

Value | Count | Frequency (%)
32.2 | 5 | 1.8%
28.2 | 5 | 1.8%
23.4 | 5 | 1.8%
22 | 5 | 1.8%
18.6 | 5 | 1.8%
16.5 | 15 | 5.3%
15.9 | 5 | 1.8%
14.3 | 5 | 1.8%
12.1 | 5 | 1.8%
11.6 | 20 | 7.0%

Coarse Aggregate_component_6
Real number (ℝ)

Distinct: 36
Distinct (%): 12.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 966.67148
Minimum: 852.1
Maximum: 1134.3
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.3 KiB

Quantile statistics

Minimum: 852.1
5-th percentile: 852.1
Q1: 932
Median: 944.7
Q3: 1028.4
95-th percentile: 1088.1
Maximum: 1134.3
Range: 282.2
Interquartile range (IQR): 96.4

Descriptive statistics

Standard deviation: 74.500208
Coefficient of variation (CV): 0.077068797
Kurtosis: -0.79422454
Mean: 966.67148
Median Absolute Deviation (MAD): 59.9
Skewness: 0.15870041
Sum: 274534.7
Variance: 5550.281
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=36)

Value | Count | Frequency (%)
932 | 53 | 18.7%
852.1 | 45 | 15.8%
944.7 | 30 | 10.6%
1028.4 | 10 | 3.5%
1047 | 7 | 2.5%
978.4 | 6 | 2.1%
1088.1 | 5 | 1.8%
1058.6 | 5 | 1.8%
1058.7 | 5 | 1.8%
1065.8 | 5 | 1.8%
Other values (26) | 113 | 39.8%

Value | Count | Frequency (%)
852.1 | 45 | 15.8%
884.9 | 5 | 1.8%
926.1 | 5 | 1.8%
932 | 53 | 18.7%
936 | 5 | 1.8%
942.7 | 4 | 1.4%
944.7 | 30 | 10.6%
946.8 | 5 | 1.8%
947 | 5 | 1.8%
949.9 | 5 | 1.8%

Value | Count | Frequency (%)
1134.3 | 5 | 1.8%
1120 | 1 | 0.4%
1090 | 5 | 1.8%
1088.1 | 5 | 1.8%
1085.4 | 5 | 1.8%
1066 | 5 | 1.8%
1065.8 | 5 | 1.8%
1058.7 | 5 | 1.8%
1058.6 | 5 | 1.8%
1057.6 | 5 | 1.8%

Fine Aggregate_component_7
Real number (ℝ)

Distinct: 40
Distinct (%): 14.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 771.42077
Minimum: 594
Maximum: 992.6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.3 KiB

Quantile statistics

Minimum: 594
5-th percentile: 594
Q1: 707.9
Median: 780.1
Q3: 847.9
95-th percentile: 905.585
Maximum: 992.6
Range: 398.6
Interquartile range (IQR): 140

Descriptive statistics

Standard deviation: 99.102353
Coefficient of variation (CV): 0.12846731
Kurtosis: -0.57195314
Mean: 771.42077
Median Absolute Deviation (MAD): 72
Skewness: -0.28134921
Sum: 219083.5
Variance: 9821.2763
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=40)

Value | Count | Frequency (%)
594 | 30 | 10.6%
755.8 | 30 | 10.6%
670 | 23 | 8.1%
887.1 | 15 | 5.3%
757.7 | 10 | 3.5%
803.7 | 10 | 3.5%
780.1 | 10 | 3.5%
806.9 | 7 | 2.5%
825.5 | 6 | 2.1%
785.5 | 5 | 1.8%
Other values (30) | 138 | 48.6%

Value | Count | Frequency (%)
594 | 30 | 10.6%
605 | 5 | 1.8%
611.8 | 5 | 1.8%
659.9 | 4 | 1.4%
670 | 23 | 8.1%
676 | 2 | 0.7%
707.9 | 5 | 1.8%
755.8 | 30 | 10.6%
756.7 | 5 | 1.8%
757.6 | 5 | 1.8%

Value | Count | Frequency (%)
992.6 | 5 | 1.8%
925.7 | 5 | 1.8%
905.9 | 5 | 1.8%
903.8 | 5 | 1.8%
903.6 | 5 | 1.8%
893.7 | 5 | 1.8%
887.1 | 15 | 5.3%
880.4 | 5 | 1.8%
870.3 | 5 | 1.8%
861.2 | 5 | 1.8%

Age_day
Real number (ℝ)

Distinct: 12
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 66.616197
Minimum: 3
Maximum: 365
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.3 KiB

Quantile statistics

Minimum: 3
5-th percentile: 3
Q1: 7
Median: 28
Q3: 91
95-th percentile: 270
Maximum: 365
Range: 362
Interquartile range (IQR): 84

Descriptive statistics

Standard deviation: 86.406165
Coefficient of variation (CV): 1.2970744
Kurtosis: 4.3911288
Mean: 66.616197
Median Absolute Deviation (MAD): 25
Skewness: 2.1753938
Sum: 18919
Variance: 7466.0253
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=12)

Value | Count | Frequency (%)
28 | 56 | 19.7%
3 | 47 | 16.5%
56 | 43 | 15.1%
7 | 30 | 10.6%
91 | 22 | 7.7%
14 | 20 | 7.0%
100 | 20 | 7.0%
90 | 12 | 4.2%
180 | 12 | 4.2%
270 | 10 | 3.5%
Other values (2) | 12 | 4.2%

Value | Count | Frequency (%)
3 | 47 | 16.5%
7 | 30 | 10.6%
14 | 20 | 7.0%
28 | 56 | 19.7%
56 | 43 | 15.1%
90 | 12 | 4.2%
91 | 22 | 7.7%
100 | 20 | 7.0%
180 | 12 | 4.2%
270 | 10 | 3.5%

Value | Count | Frequency (%)
365 | 10 | 3.5%
360 | 2 | 0.7%
270 | 10 | 3.5%
180 | 12 | 4.2%
100 | 20 | 7.0%
91 | 22 | 7.7%
90 | 12 | 4.2%
56 | 43 | 15.1%
28 | 56 | 19.7%
14 | 20 | 7.0%

Interactions

[64 interaction plots, one per ordered pair of the 8 variables; images not preserved]

Correlations


Auto

The auto setting selects an interpretable pairwise metric for each pair of columns according to the following mapping:
  • Variable_type-Variable_type : Method, Range
  • Categorical-Categorical : Cramer's V, [0,1]
  • Numerical-Categorical : Cramer's V, [0,1] (using a discretized numerical column)
  • Numerical-Numerical : Spearman's ρ, [-1,1]
The number of bins used in the discretization for the Numerical-Categorical column pair can be changed using config.correlations["auto"].n_bins. The number of bins affects the granularity of the association you wish to measure.

This configuration uses the recommended metric for each pair of columns.
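
The mapping above can be read as a lookup keyed by the unordered pair of variable types. The sketch below is only an illustration of that lookup. The names `AUTO_METHODS` and `auto_method` are hypothetical and are not part of pandas-profiling.

```python
# Hypothetical lookup table mirroring the mapping listed above.
AUTO_METHODS = {
    frozenset(["Categorical"]): ("Cramer's V", (0, 1)),
    frozenset(["Numerical", "Categorical"]): (
        "Cramer's V (discretized numerical column)",
        (0, 1),
    ),
    frozenset(["Numerical"]): ("Spearman's rho", (-1, 1)),
}


def auto_method(type_a, type_b):
    """Return (method, value range) for an unordered pair of variable types."""
    return AUTO_METHODS[frozenset([type_a, type_b])]


# All eight variables in this report are numeric, so every pair uses Spearman.
print(auto_method("Numerical", "Numerical"))
print(auto_method("Categorical", "Numerical"))
```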

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
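
The calculation described above can be sketched in plain Python as follows. Tied values receive the mean of the ranks they span, which is a common convention and an assumption here since the text does not say how ties are ranked. The function names are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def covariance(x, y):
    """Sample covariance of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


def spearman_rho(x, y):
    """Covariance of the ranks divided by the product of their std deviations."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    std_product = math.sqrt(covariance(rank_x, rank_x) * covariance(rank_y, rank_y))
    return covariance(rank_x, rank_y) / std_product


# Hypothetical data: y grows nonlinearly but monotonically with x.
print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
```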

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
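
A minimal sketch of this calculation, with a check of the invariance under changes in location and scale mentioned above. The data are hypothetical and the function name is illustrative.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their std deviations.

    The 1/(n-1) factors of the covariance and the variances cancel,
    so plain sums of products are used.
    """
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4]
y = [2, 1, 4, 3]
r = pearson_r(x, y)

# Shifting and rescaling each variable separately (positive scale) leaves r unchanged.
r_transformed = pearson_r([2 * a + 3 for a in x], [5 * b - 1 for b in y])
print(r, r_transformed)
```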

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
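
The formula as stated, with no adjustment for ties, corresponds to the variant usually called tau-a. The sketch below implements that literal reading. Library implementations such as `scipy.stats.kendalltau` default to the tie-adjusted tau-b, so values on data with many ties, like the columns in this report, may differ from what this sketch returns.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1      # both variables move in the same direction
        elif product < 0:
            discordant += 1      # the variables move in opposite directions
        # product == 0 means a tie, counted as neither
    return (concordant - discordant) / len(pairs)


# Hypothetical data: two concordant pairs and one discordant pair.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```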

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

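Index | Cement | Blast Furnace Slag _component_2 | Fly Ash _component_3 | Water_component_4 | Superplasticizer_component_5 | Coarse Aggregate_component_6 | Fine Aggregate_component_7 | Age_day
0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28
1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28
2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270
3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365
4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360
5 | 266.0 | 114.0 | 0.0 | 228.0 | 0.0 | 932.0 | 670.0 | 90
6 | 380.0 | 95.0 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365
7 | 380.0 | 95.0 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 28
8 | 266.0 | 114.0 | 0.0 | 228.0 | 0.0 | 932.0 | 670.0 | 28
9 | 475.0 | 0.0 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 28

Index | Cement | Blast Furnace Slag _component_2 | Fly Ash _component_3 | Water_component_4 | Superplasticizer_component_5 | Coarse Aggregate_component_6 | Fine Aggregate_component_7 | Age_day
274 | 251.4 | 0.0 | 118.3 | 188.5 | 5.8 | 1028.4 | 757.7 | 3
275 | 251.4 | 0.0 | 118.3 | 188.5 | 5.8 | 1028.4 | 757.7 | 14
276 | 251.4 | 0.0 | 118.3 | 188.5 | 5.8 | 1028.4 | 757.7 | 28
277 | 251.4 | 0.0 | 118.3 | 188.5 | 5.8 | 1028.4 | 757.7 | 56
278 | 251.4 | 0.0 | 118.3 | 188.5 | 5.8 | 1028.4 | 757.7 | 100
279 | 251.4 | 0.0 | 118.3 | 188.5 | 6.4 | 1028.4 | 757.7 | 3
280 | 251.4 | 0.0 | 118.3 | 188.5 | 6.4 | 1028.4 | 757.7 | 14
281 | 251.4 | 0.0 | 118.3 | 188.5 | 6.4 | 1028.4 | 757.7 | 28
282 | 251.4 | 0.0 | 118.3 | 188.5 | 6.4 | 1028.4 | 757.7 | 56
283 | 251.4 | 0.0 | 118.3 | 188.5 | 6.4 | 1028.4 | 757.7 | 100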

Duplicate rows

Most frequently occurring

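Index | Cement | Blast Furnace Slag _component_2 | Fly Ash _component_3 | Water_component_4 | Superplasticizer_component_5 | Coarse Aggregate_component_6 | Fine Aggregate_component_7 | Age_day | # duplicates
0 | 362.6 | 189.0 | 0.0 | 164.9 | 11.6 | 944.7 | 755.8 | 3 | 4
1 | 362.6 | 189.0 | 0.0 | 164.9 | 11.6 | 944.7 | 755.8 | 7 | 4
2 | 362.6 | 189.0 | 0.0 | 164.9 | 11.6 | 944.7 | 755.8 | 28 | 4
3 | 362.6 | 189.0 | 0.0 | 164.9 | 11.6 | 944.7 | 755.8 | 56 | 4
4 | 362.6 | 189.0 | 0.0 | 164.9 | 11.6 | 944.7 | 755.8 | 91 | 4
5 | 425.0 | 106.3 | 0.0 | 153.5 | 16.5 | 852.1 | 887.1 | 3 | 3
6 | 425.0 | 106.3 | 0.0 | 153.5 | 16.5 | 852.1 | 887.1 | 7 | 3
7 | 425.0 | 106.3 | 0.0 | 153.5 | 16.5 | 852.1 | 887.1 | 28 | 3
8 | 425.0 | 106.3 | 0.0 | 153.5 | 16.5 | 852.1 | 887.1 | 56 | 3
9 | 425.0 | 106.3 | 0.0 | 153.5 | 16.5 | 852.1 | 887.1 | 91 | 3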
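
The table lists each repeated row once, together with how often it occurs. A minimal sketch of that grouping on plain tuples follows. The helper name `duplicate_groups` is illustrative, and the example rows are a small selection from the tables in this report rather than the full dataset.

```python
from collections import Counter


def duplicate_groups(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(tuple(row) for row in rows)
    return {row: count for row, count in counts.items() if count > 1}


# Column order: Cement, Slag, Fly Ash, Water, Superplasticizer,
# Coarse Aggregate, Fine Aggregate, Age_day.
repeated_a = (362.6, 189.0, 0.0, 164.9, 11.6, 944.7, 755.8, 3)
repeated_b = (425.0, 106.3, 0.0, 153.5, 16.5, 852.1, 887.1, 3)
unique_row = (540.0, 0.0, 0.0, 162.0, 2.5, 1040.0, 676.0, 28)

rows = [repeated_a] * 4 + [repeated_b] * 3 + [unique_row]
groups = duplicate_groups(rows)
for row, count in sorted(groups.items(), key=lambda item: -item[1]):
    print(row, count)
```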